Automatically Tuning Parallel and Parallelized Programs

Abstract

In today’s multicore era, parallelizing serial code is essential to exploit the architecture’s performance potential. Parallelizing legacy code in particular is challenging: manual effort must go either into algorithmic modifications or into analyzing computationally intensive sections of code for the best possible parallel performance, and both are difficult and time-consuming. Automatic parallelization uses sophisticated compile-time techniques to identify parallelism in serial programs, reducing the burden on the program developer. Similar sophistication is needed to improve the performance of hand-parallelized programs. A key difficulty is that optimizing compilers are generally unable to estimate the performance of an application, or even of a program section, at compile time, so the task of performance improvement invariably rests with the developer. Automatic tuning uses static analysis and runtime performance metrics to determine the compile-time choices that yield the best application performance. This paper describes an offline tuning approach that uses a source-to-source parallelizing compiler, Cetus, and a tuning framework to tune parallel application performance. The implementation uses an existing, generic tuning algorithm called Combined Elimination to study the effect of serializing parallelizable loops based on measured whole-program execution time, and it outputs a combination of parallel loops that is guaranteed to match or improve on the performance of the original program. We evaluated our algorithm on a suite of hand-parallelized C benchmarks from the SPEC OMP2001 and NAS Parallel benchmarks and provide two sets of results. The first ignores hand-parallelized loops and tunes application performance based only on Cetus-parallelized loops. The second considers the tuning of additional parallelism in hand-parallelized code. We show that our implementation always performs nearly as well as or better than serial code when tuning only Cetus-parallelized loops, and as well as or better than hand-parallelized code when tuning additional parallelism.
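
The loop-serialization search described in the abstract can be illustrated with a minimal sketch of Combined Elimination as the algorithm is generally described: start with every candidate loop parallel, measure how whole-program time changes when each loop is serialized on its own, serialize the most harmful loop, then re-test the other harmful loops against the new baseline. This is not the paper's Cetus implementation. The names `combined_elimination`, `measure`, and `toy_measure` are hypothetical, and `toy_measure` is an invented cost model standing in for actually compiling and timing a program.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def combined_elimination(
    loops: Iterable[str],
    measure: Callable[[FrozenSet[str]], float],
) -> FrozenSet[str]:
    """Return the set of loops to keep parallel.

    `measure(parallel_loops)` is a hypothetical callback: it is assumed to
    build and run the program with exactly those loops parallelized and to
    return the whole-program execution time (a positive number).
    """
    enabled = frozenset(loops)       # baseline: every candidate loop parallel
    base_time = measure(enabled)

    while True:
        # Time with each loop serialized on its own, and the relative change
        # against the current baseline (negative means serializing helps).
        times: Dict[str, float] = {l: measure(enabled - {l}) for l in enabled}
        rip = {l: (times[l] - base_time) / base_time for l in enabled}

        # Loops whose parallel version hurts, most harmful first.
        harmful = sorted((l for l in enabled if rip[l] < 0),
                         key=lambda l: (rip[l], l))
        if not harmful:
            return enabled           # no single serialization helps any more

        # Serialize the most harmful loop and adopt the new baseline.
        enabled = enabled - {harmful[0]}
        base_time = times[harmful[0]]

        # Re-test the remaining harmful loops against the updated baseline,
        # since loop effects may interact.
        for loop in harmful[1:]:
            t = measure(enabled - {loop})
            if t < base_time:
                enabled = enabled - {loop}
                base_time = t


def toy_measure(parallel: FrozenSet[str]) -> float:
    # Invented cost model: seconds added (or saved) when a loop runs parallel.
    delta = {"A": -4.0, "B": 1.5, "C": 0.5}
    return 10.0 + sum(delta[l] for l in parallel)


if __name__ == "__main__":
    print(sorted(combined_elimination(["A", "B", "C"], toy_measure)))
```

In this sketch the baseline time never increases, so with repeatable measurements the returned configuration is no slower than the starting one, which mirrors the match-or-improve property stated in the abstract. Real timings are noisy, so a practical tuner would likely average several runs per configuration.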
